Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem

Authors

Saba Q. Yahyaa

Madalina M. Drugan

Bernard Manderick

Abstract

The multi-objective, multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these multiple rewards might be conflicting. The agent has a set of optimal arms, and its goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into one-objective arms. The LS function is simple; however, it cannot find the entire optimal arm set. As a result, we extend the knowledge gradient (KG) policy to the LS function. We propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We compare the two variants experimentally: LS-KG across arms finds the optimal arm set, while LS-KG across dimensions plays the optimal arms fairly.
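The limitation of the LS function mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' LS-KG policy: it scalarizes known mean vectors directly, with no learning or exploration. The mean vectors, the weight grid, and all function names below are hypothetical and chosen only for illustration. The example places one Pareto optimal arm in a non-convex region of the Pareto front, where no weight vector makes it the best scalarized arm.

```python
from typing import List, Sequence, Set, Tuple


def linear_scalarize(mean_vector: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the objectives: turns a reward vector into one scalar."""
    return sum(w * m for w, m in zip(weights, mean_vector))


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u Pareto-dominates v: at least as good in every objective, better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal_arms(means: List[Sequence[float]]) -> Set[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return {
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    }


def arms_found_by_ls(
    means: List[Sequence[float]], weight_sets: List[Tuple[float, float]]
) -> Set[int]:
    """Arms that maximize the scalarized mean for at least one weight vector."""
    found: Set[int] = set()
    for weights in weight_sets:
        scores = [linear_scalarize(m, weights) for m in means]
        best = max(scores)
        found.update(i for i, s in enumerate(scores) if s == best)
    return found


# Hypothetical true means for four two-objective arms (assumed, not from the paper).
# Arm 1 is Pareto optimal but lies in a non-convex region of the front.
means = [(1.0, 0.0), (0.4, 0.4), (0.0, 1.0), (0.2, 0.2)]

# Hypothetical grid of weight vectors (w, 1 - w) that sum to one.
weight_sets = [(k / 10, 1 - k / 10) for k in range(11)]

pareto = pareto_optimal_arms(means)
found = arms_found_by_ls(means, weight_sets)

print("Pareto optimal arms:", sorted(pareto))
print("Arms found by LS:   ", sorted(found))
```

In this sketch, arm 1 scores 0.4 under every weight vector, while the better of arms 0 and 2 always scores at least 0.5, so arm 1 is never selected even though no other arm dominates it.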

Related resources

Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms

We extend knowledge gradient (KG) policy for the multi-objective, multi-armed bandits problem to efficiently explore the Pareto optimal arms. We consider two partial order relationships to order the mean vectors, i.e. Pareto and scalarized functions. Pareto KG finds the optimal arms using Pareto search, while the scalarizations-KG transform the multi-objective arms into one-objective arm to fin...


Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem

The multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single scalar reward. Moreover, these multiple rewards might be conflicting. The MOMAB-problem has a set of Pareto optimal arms and an agent’s goal is not only to find that set but also to play evenly or fairly the arms in that set....


Multi-Objective Reinforcement Learning

In multi-objective reinforcement learning (MORL) the agent is provided with multiple feedback signals when performing an action. These signals can be independent, complementary or conflicting. Hence, MORL is the process of learning policies that optimize multiple criteria simultaneously. In this abstract, we briefly describe our extensions to single-objective multi-armed bandits and reinforceme...


Pareto Adaptive Decomposition algorithm

Dealing with multi-objective combinatorial optimization and local search, this article proposes a new multi-objective meta-heuristic named Pareto Adaptive Decomposition algorithm (PAD). Combining ideas from decomposition methods, two phase algorithms and multi-armed bandit, PAD provides a 2-phase modular framework for finding an approximation of the Pareto front. The first phase decomposes the ...


Budgeted Bandit Problems with Continuous Random Costs

We study the budgeted bandit problem, where each arm is associated with both a reward and a cost. In a budgeted bandit problem, the objective is to design an arm pulling algorithm in order to maximize the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...

Publication date: 2014
